{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 模型导出\n",
    "\n",
    "\n",
    "为了将训练好的机器学习模型部署到各个目标平台（如服务器、移动端、嵌入式设备和浏览器等），我们的第一步往往是将训练好的整个模型完整导出（序列化）为一系列标准格式的文件。在此基础上，我们才可以在不同的平台上使用相对应的部署工具来部署模型文件。TensorFlow 提供了统一模型导出格式 SavedModel，使得我们训练好的模型可以以这一格式为中介，\n",
    "\n",
    "\n",
    "Checkpoint可以帮助我们保存和恢复模型中参数的权值。而作为模型导出格式的 SavedModel 则更进一步，其包含了一个 TensorFlow 程序的完整信息：不仅包含参数的权值，还包含计算的流程（即计算图）。当模型导出为 SavedModel 文件时，无须模型的源代码即可再次运行模型，这使得 SavedModel 尤其适用于模型的分享和部署。后文的 TensorFlow Serving（服务器端部署模型）、TensorFlow Lite（移动端部署模型）以及 TensorFlow.js 都会用到这一格式。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tf.saved_model.save(model, \"保存的目标文件夹名称\")\n",
    "#在需要载入 SavedModel 文件时，使用\n",
    "model = tf.saved_model.load(\"保存的目标文件夹名称\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 对于使用继承 tf.keras.Model 类建立的 Keras 模型 model ，使用 SavedModel 载入后将无法使用 model() 直接进行推断，而需要使用 model.call() 。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#导出模型到 saved/1 文件夹：\n",
    "import tensorflow as tf\n",
    "from zh.model.utils import MNISTLoader\n",
    "\n",
    "num_epochs = 1\n",
    "batch_size = 50\n",
    "learning_rate = 0.001\n",
    "\n",
    "model = tf.keras.models.Sequential([\n",
    "    tf.keras.layers.Flatten(),\n",
    "    tf.keras.layers.Dense(100, activation=tf.nn.relu),\n",
    "    tf.keras.layers.Dense(10),\n",
    "    tf.keras.layers.Softmax()\n",
    "])\n",
    "\n",
    "data_loader = MNISTLoader()\n",
    "model.compile(\n",
    "    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),\n",
    "    loss=tf.keras.losses.sparse_categorical_crossentropy,\n",
    "    metrics=[tf.keras.metrics.sparse_categorical_accuracy]\n",
    ")\n",
    "model.fit(data_loader.train_data, data_loader.train_label, epochs=num_epochs, batch_size=batch_size)\n",
    "tf.saved_model.save(model, \"saved/1\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#将 saved/1 中的模型导入并测试性能：\n",
    "import tensorflow as tf\n",
    "from zh.model.utils import MNISTLoader\n",
    "\n",
    "batch_size = 50\n",
    "\n",
    "model = tf.saved_model.load(\"saved/1\")\n",
    "data_loader = MNISTLoader()\n",
    "sparse_categorical_accuracy = tf.keras.metrics.SparseCategoricalAccuracy()\n",
    "num_batches = int(data_loader.num_test_data // batch_size)\n",
    "for batch_index in range(num_batches):\n",
    "    start_index, end_index = batch_index * batch_size, (batch_index + 1) * batch_size\n",
    "    y_pred = model(data_loader.test_data[start_index: end_index])\n",
    "    sparse_categorical_accuracy.update_state(y_true=data_loader.test_label[start_index: end_index], y_pred=y_pred)\n",
    "print(\"test accuracy: %f\" % sparse_categorical_accuracy.result())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "使用继承 tf.keras.Model 类建立的 Keras 模型同样可以以相同方法导出，唯须注意 call 方法需要以 @tf.function 修饰，以转化为 SavedModel 支持的计算图，代码如下："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "    def __init__(self):\n",
    "        super().__init__()\n",
    "        self.flatten = tf.keras.layers.Flatten()\n",
    "        self.dense1 = tf.keras.layers.Dense(units=100, activation=tf.nn.relu)\n",
    "        self.dense2 = tf.keras.layers.Dense(units=10)\n",
    "\n",
    "    @tf.function\n",
    "    def call(self, inputs):         # [batch_size, 28, 28, 1]\n",
    "        x = self.flatten(inputs)    # [batch_size, 784]\n",
    "        x = self.dense1(x)          # [batch_size, 100]\n",
    "        x = self.dense2(x)          # [batch_size, 10]\n",
    "        output = tf.nn.softmax(x)\n",
    "        return output\n",
    "\n",
    "model = MLP()\n",
    "..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#模型导入并测试性能的过程也相同，唯须注意模型推断时需要显式调用 call 方法，即使用：\n",
    "y_pred = model.call(data_loader.test_data[start_index: end_index])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### tf.function ：图执行模式 \n",
    "\n",
    "\n",
    "虽然默认的即时执行模式（Eager Execution）为我们带来了灵活及易调试的特性，但在特定的场合，例如追求高性能或部署模型时，我们依然希望使用 TensorFlow 1.X 中默认的图执行模式（Graph Execution），将模型转换为高效的 TensorFlow 图模型。此时，TensorFlow 2 为我们提供了 tf.function 模块，结合 AutoGraph 机制，使得我们仅需加入一个简单的 @tf.function 修饰符，就能轻松将模型以图执行模式运行。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tensorflow as tf\n",
    "import time\n",
    "from zh.model.mnist.cnn import CNN\n",
    "from zh.model.utils import MNISTLoader\n",
    "\n",
    "num_batches = 1000\n",
    "batch_size = 50\n",
    "learning_rate = 0.001\n",
    "data_loader = MNISTLoader()\n",
    "model = CNN()\n",
    "optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)\n",
    "\n",
    "@tf.function\n",
    "def train_one_step(X, y):    \n",
    "    with tf.GradientTape() as tape:\n",
    "        y_pred = model(X)\n",
    "        loss = tf.keras.losses.sparse_categorical_crossentropy(y_true=y, y_pred=y_pred)\n",
    "        loss = tf.reduce_mean(loss)\n",
    "        # 注意这里使用了TensorFlow内置的tf.print()。@tf.function不支持Python内置的print方法\n",
    "        tf.print(\"loss\", loss)\n",
    "    grads = tape.gradient(loss, model.variables)    \n",
    "    optimizer.apply_gradients(grads_and_vars=zip(grads, model.variables))\n",
    "\n",
    "start_time = time.time()\n",
    "for batch_index in range(num_batches):\n",
    "    X, y = data_loader.get_batch(batch_size)\n",
    "    train_one_step(X, y)\n",
    "end_time = time.time()\n",
    "print(end_time - start_time)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "运行 400 个 Batch 进行测试，加入 @tf.function 的程序耗时 35.5 秒，未加入 @tf.function 的纯即时执行模式程序耗时 43.8 秒。可见 @tf.function 带来了一定的性能提升。一般而言，当模型由较多小的操作组成的时候， @tf.function 带来的提升效果较大。而当模型的操作数量较少，但单一操作均很耗时的时候，则 @tf.function 带来的性能提升不会太大。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## TensorFlow Serving"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "当我们将模型训练完毕后，往往需要将模型在生产环境中部署。最常见的方式，是在服务器上提供一个 API，即客户机向服务器的某个 API 发送特定格式的请求，服务器收到请求数据后通过模型进行计算，并返回结果。如果仅仅是做一个 Demo，不考虑高并发和性能问题，其实配合 Flask 等 Python 下的 Web 框架就能非常轻松地实现服务器 API。不过，如果是在真的实际生产环境中部署，这样的方式就显得力不从心了。这时，TensorFlow 为我们提供了 TensorFlow Serving 这一组件，能够帮助我们在实际生产环境中灵活且高性能地部署机器学习模型。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#使用docker安装\n",
    "\n",
    "#TensorFlow Serving 模型部署\n",
    "'TensorFlow Serving 可以直接读取 SavedModel 格式的模型进行部署。使用以下命令即可：'\n",
    "tensorflow_model_server \\\n",
    "    --rest_api_port=端口号（如8501） \\\n",
    "    --model_name=模型名 \\\n",
    "    --model_base_path=\"SavedModel格式模型的文件夹绝对地址（不含版本号）\"\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
